fix(citations): drop pre-publication citation buckets #339

Merged
neuromechanist merged 2 commits into develop from feature/citations-no-prepublication on Jun 10, 2026

@neuromechanist

Problem

OpenAlex citing works occasionally carry incorrect publication years, producing citation counts dated before the cited paper existed. The BIDS 2016 paper, for example, showed a 2013 bucket — impossible. This applies both per paper and across a version group.

Fix

Floor each canonical paper's per-year histogram at its earliest version publication year (the preprint, when one is aliased). Counts in years before that are dropped.

  • resolve_work now returns the work id and publication_year (ResolvedWork).
  • sync_citing_papers takes min(publication_year) across the resolved version group and filters counts to year >= floor. The empty-counts guard still protects existing data.

LSL (preprint 2024 / published 2025) is unaffected (its citations are 2024+); BIDS 2016 loses its spurious pre-2016 buckets.
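A minimal sketch of the flooring step described above, not the merged implementation. The field names on ResolvedWork, the helper names earliest_version_year and drop_prepublication, and the choice to leave counts untouched when no version has a year are all assumptions made for illustration; the PR only states that ResolvedWork carries the work id and publication_year.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class ResolvedWork:
    # Assumed shape: the PR says only that it holds the id and the year.
    work_id: str
    publication_year: Optional[int]


def earliest_version_year(versions: Iterable[ResolvedWork]) -> Optional[int]:
    """Return the minimum known publication year across a version group."""
    years = [v.publication_year for v in versions if v.publication_year is not None]
    return min(years) if years else None


def drop_prepublication(counts: Dict[int, int], floor: Optional[int]) -> Dict[int, int]:
    """Keep only the per-year counts with year >= floor."""
    if floor is None:
        # Assumption: with no known year there is nothing to floor against.
        return dict(counts)
    return {year: n for year, n in counts.items() if year >= floor}


# Version group shaped like the LSL case: preprint 2024, published 2025.
# The ids and counts are placeholders, not real data.
group = [ResolvedWork("W-preprint", 2024), ResolvedWork("W-published", 2025)]
floor = earliest_version_year(group)
print(drop_prepublication({2023: 1, 2024: 4, 2025: 9}, floor))
```

With the preprint in the group the floor is 2024, so the 2023 bucket is dropped and 2024 onward is kept, matching the group case in the test plan.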

Test plan

  • OpenAlex client: resolve_work returns id + year; missing year -> None; 404 -> None.
  • Sync: a 2013 bucket for a 2016 paper is dropped; for a group, the floor is the earliest version year (preprint 2024 over published 2025, so 2023 is dropped, 2024+ kept).
  • Regression: citations endpoint, stats, papers_sync — green.

Deploy follow-up

Re-run sync papers --community {eeglab,bids} --citations on dev and prod to refresh the floored histograms.

OpenAlex citing works sometimes carry bad publication years, producing
citation counts in years before the paper existed (e.g. the BIDS 2016 paper
showed a 2013 bucket). Floor each canonical paper's histogram at its earliest
version publication year.

- resolve_work returns the work id AND publication_year (ResolvedWork).
- sync_citing_papers takes the minimum publication year across the version
  group (the preprint, when present) and drops any count in earlier years.

Tests: pre-publication bucket dropped; floor uses the earliest version year.
- Use 'is not None' for publication_year filter (don't drop a year-0 edge).
- Test that an over-high floor (bogus future year) leaves existing counts
  intact via the empty-counts guard rather than wiping them.
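The two points in this commit message can be illustrated with a short sketch. The function names min_known_year and counts_to_store are hypothetical, and the guard is shown in its simplest assumed form (fall back to the stored counts when filtering leaves nothing); the actual guard in sync_citing_papers may be structured differently.

```python
from typing import Dict, Iterable, Optional


def min_known_year(years: Iterable[Optional[int]]) -> Optional[int]:
    """Minimum year, skipping only missing values.

    Testing `y is not None` instead of truthiness keeps a year of 0,
    which `if y` would silently discard.
    """
    known = [y for y in years if y is not None]
    return min(known) if known else None


def counts_to_store(
    existing: Dict[int, int], fresh: Dict[int, int], floor: Optional[int]
) -> Dict[int, int]:
    """Apply the floor, but never replace stored counts with an empty result."""
    kept = {y: n for y, n in fresh.items() if floor is None or y >= floor}
    # Assumed empty-counts guard: a bogus future floor filters everything out,
    # so keep what is already stored rather than wiping it.
    return kept if kept else dict(existing)


print(min_known_year([0, 2016, None]))
print(counts_to_store({2016: 10}, {2016: 12, 2017: 3}, 2999))
```

The second call models the over-high floor case: every fresh bucket is filtered out, so the existing counts come back unchanged.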
@neuromechanist merged commit 1a9c7af into develop on Jun 10, 2026
6 checks passed
@neuromechanist deleted the feature/citations-no-prepublication branch on June 10, 2026, 04:24